Dataset statistics
| Number of variables | 8 |
|---|---|
| Number of observations | 785 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 44.5 KiB |
| Average record size in memory | 58.1 B |
Variable types
| Categorical | 1 |
|---|---|
| Numeric | 7 |
numcol is highly correlated with totalprod and 2 other fields | High correlation |
totalprod is highly correlated with numcol and 2 other fields | High correlation |
stocks is highly correlated with numcol and 2 other fields | High correlation |
prodvalue is highly correlated with numcol and 2 other fields | High correlation |
numcol is highly correlated with totalprod and 2 other fields | High correlation |
yieldpercol is highly correlated with totalprod | High correlation |
totalprod is highly correlated with numcol and 3 other fields | High correlation |
stocks is highly correlated with numcol and 2 other fields | High correlation |
prodvalue is highly correlated with numcol and 2 other fields | High correlation |
numcol is highly correlated with totalprod and 2 other fields | High correlation |
totalprod is highly correlated with numcol and 2 other fields | High correlation |
stocks is highly correlated with numcol and 2 other fields | High correlation |
prodvalue is highly correlated with numcol and 2 other fields | High correlation |
totalprod is highly correlated with prodvalue and 4 other fields | High correlation |
prodvalue is highly correlated with totalprod and 3 other fields | High correlation |
state is highly correlated with totalprod and 4 other fields | High correlation |
yieldpercol is highly correlated with totalprod and 2 other fields | High correlation |
numcol is highly correlated with totalprod and 3 other fields | High correlation |
priceperlb is highly correlated with year | High correlation |
year is highly correlated with priceperlb | High correlation |
stocks is highly correlated with totalprod and 4 other fields | High correlation |
Reproduction
| Analysis started | 2021-06-12 17:17:41.960933 |
|---|---|
| Analysis finished | 2021-06-12 17:17:52.089456 |
| Duration | 10.13 seconds |
| Software version | pandas-profiling v3.0.0 |
| Download configuration | config.json |
state
| Distinct | 44 |
|---|---|
| Distinct (%) | 5.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 6.3 KiB |
| Indiana | 19 |
|---|---|
| Montana | 19 |
| Pennsylvania | 19 |
| North Dakota | 19 |
| South Dakota | 19 |
| Other values (39) |
Length
| Max length | 14 |
|---|---|
| Median length | 8 |
| Mean length | 8.080254777 |
| Min length | 4 |
Characters and Unicode
| Total characters | 6343 |
|---|---|
| Distinct characters | 45 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
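As a rough illustration of how such character properties can be tallied, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The helper name `category_counts` and the three sample strings are assumptions made for this example; this is not the code the report generator runs. The two-letter codes correspond to the category names in the tables of this section: `Ll` is Lowercase Letter, `Lu` is Uppercase Letter, and `Zs` is Space Separator.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories (e.g. 'Lu', 'Ll', 'Zs') over all characters."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# Three illustrative state names, not the full column.
sample = ["Alabama", "North Dakota", "West Virginia"]
counts = category_counts(sample)
for code, n in counts.most_common():
    print(code, n)
```

Applied to the whole `state` column, the same tally should give the kind of breakdown shown under "Most occurring categories".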
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Alabama |
|---|---|
| 2nd row | Arizona |
| 3rd row | Arkansas |
| 4th row | California |
| 5th row | Colorado |
Common Values
| Value | Count | Frequency (%) |
| Indiana | 19 | 2.4% |
| Montana | 19 | 2.4% |
| Pennsylvania | 19 | 2.4% |
| North Dakota | 19 | 2.4% |
| South Dakota | 19 | 2.4% |
| Utah | 19 | 2.4% |
| Missouri | 19 | 2.4% |
| Michigan | 19 | 2.4% |
| Mississippi | 19 | 2.4% |
| Illinois | 19 | 2.4% |
| Other values (34) | 595 |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| new | 53 | 5.8% |
| virginia | 38 | 4.1% |
| north | 38 | 4.1% |
| dakota | 38 | 4.1% |
| carolina | 25 | 2.7% |
| south | 25 | 2.7% |
| florida | 19 | 2.1% |
| arkansas | 19 | 2.1% |
| west | 19 | 2.1% |
| pennsylvania | 19 | 2.1% |
| Other values (35) | 627 |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 837 | |
| i | 686 | 10.8% |
| n | 582 | 9.2% |
| o | 546 | 8.6% |
| s | 437 | 6.9% |
| e | 383 | 6.0% |
| r | 335 | 5.3% |
| t | 234 | 3.7% |
| l | 170 | 2.7% |
| h | 164 | 2.6% |
| Other values (35) | 1969 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 5288 | |
| Uppercase Letter | 920 | 14.5% |
| Space Separator | 135 | 2.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 837 | |
| i | 686 | |
| n | 582 | |
| o | 546 | |
| s | 437 | |
| e | 383 | |
| r | 335 | 6.3% |
| t | 234 | 4.4% |
| l | 170 | 3.2% |
| h | 164 | 3.1% |
| Other values (14) | 914 |
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 135 | |
| N | 121 | |
| I | 76 | 8.3% |
| W | 76 | 8.3% |
| C | 63 | 6.8% |
| A | 57 | 6.2% |
| V | 57 | 6.2% |
| O | 44 | 4.8% |
| K | 38 | 4.1% |
| D | 38 | 4.1% |
| Other values (10) | 215 |
Space Separator
| Value | Count | Frequency (%) |
| 135 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 6208 | |
| Common | 135 | 2.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| a | 837 | |
| i | 686 | 11.1% |
| n | 582 | 9.4% |
| o | 546 | 8.8% |
| s | 437 | 7.0% |
| e | 383 | 6.2% |
| r | 335 | 5.4% |
| t | 234 | 3.8% |
| l | 170 | 2.7% |
| h | 164 | 2.6% |
| Other values (34) | 1834 |
Common
| Value | Count | Frequency (%) |
| 135 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 6343 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| a | 837 | |
| i | 686 | 10.8% |
| n | 582 | 9.2% |
| o | 546 | 8.6% |
| s | 437 | 6.9% |
| e | 383 | 6.0% |
| r | 335 | 5.3% |
| t | 234 | 3.7% |
| l | 170 | 2.7% |
| h | 164 | 2.6% |
| Other values (35) | 1969 |
numcol
| Distinct | 164 |
|---|---|
| Distinct (%) | 20.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 61686.6242 |
| Minimum | 2000 |
|---|---|
| Maximum | 510000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.3 KiB |
Quantile statistics
| Minimum | 2000 |
|---|---|
| 5-th percentile | 5000 |
| Q1 | 9000 |
| median | 26000 |
| Q3 | 65000 |
| 95-th percentile | 269000 |
| Maximum | 510000 |
| Range | 508000 |
| Interquartile range (IQR) | 56000 |
Descriptive statistics
| Standard deviation | 92748.94046 |
|---|---|
| Coefficient of variation (CV) | 1.50355027 |
| Kurtosis | 7.675632202 |
| Mean | 61686.6242 |
| Median Absolute Deviation (MAD) | 19000 |
| Skewness | 2.724034252 |
| Sum | 48424000 |
| Variance | 8602365956 |
| Monotonicity | Not monotonic |
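Several of the figures in the quantile and descriptive statistics tables are derived from others by simple arithmetic. The sketch below recomputes the range, the interquartile range, the coefficient of variation, and the variance from the values printed above. The variable names are chosen for this example only.

```python
# Figures copied from the tables above (minimum 2000, maximum 510000).
minimum, maximum = 2000, 510000
q1, q3 = 9000, 65000
mean, std = 61686.6242, 92748.94046

value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # Interquartile range = Q3 - Q1
cv = std / mean                  # Coefficient of variation = standard deviation / mean
variance = std ** 2              # Variance = standard deviation squared

print(value_range, iqr, cv, variance)
```

Because the inputs are rounded as printed in the report, the recomputed coefficient of variation and variance agree with the tabulated values only up to rounding.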
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 7000 | 51 | 6.5% |
| 6000 | 37 | 4.7% |
| 8000 | 35 | 4.5% |
| 9000 | 30 | 3.8% |
| 5000 | 27 | 3.4% |
| 10000 | 26 | 3.3% |
| 11000 | 20 | 2.5% |
| 14000 | 18 | 2.3% |
| 4000 | 17 | 2.2% |
| 12000 | 15 | 1.9% |
| Other values (154) | 509 |
| Value | Count | Frequency (%) |
| 2000 | 1 | 0.1% |
| 3000 | 8 | 1.0% |
| 4000 | 17 | 2.2% |
| 5000 | 27 | 3.4% |
| 6000 | 37 | 4.7% |
| 7000 | 51 | 6.5% |
| 8000 | 35 | 4.5% |
| 9000 | 30 | 3.8% |
| 10000 | 26 | 3.3% |
| 11000 | 20 | 2.5% |
| Value | Count | Frequency (%) |
| 510000 | 1 | 0.1% |
| 490000 | 2 | |
| 485000 | 1 | 0.1% |
| 480000 | 3 | |
| 470000 | 1 | 0.1% |
| 465000 | 1 | 0.1% |
| 460000 | 2 | |
| 450000 | 2 | |
| 440000 | 1 | 0.1% |
| 420000 | 1 | 0.1% |
yieldpercol
| Distinct | 98 |
|---|---|
| Distinct (%) | 12.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 60.57834395 |
| Minimum | 19 |
|---|---|
| Maximum | 136 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.3 KiB |
Quantile statistics
| Minimum | 19 |
|---|---|
| 5-th percentile | 34 |
| Q1 | 46 |
| median | 58 |
| Q3 | 72 |
| 95-th percentile | 96 |
| Maximum | 136 |
| Range | 117 |
| Interquartile range (IQR) | 26 |
Descriptive statistics
| Standard deviation | 19.4278306 |
|---|---|
| Coefficient of variation (CV) | 0.3207058717 |
| Kurtosis | 0.5848019806 |
| Mean | 60.57834395 |
| Median Absolute Deviation (MAD) | 12 |
| Skewness | 0.7464778953 |
| Sum | 47554 |
| Variance | 377.4406018 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 50 | 25 | 3.2% |
| 46 | 24 | 3.1% |
| 51 | 23 | 2.9% |
| 60 | 21 | 2.7% |
| 48 | 21 | 2.7% |
| 52 | 20 | 2.5% |
| 66 | 18 | 2.3% |
| 55 | 18 | 2.3% |
| 70 | 17 | 2.2% |
| 61 | 17 | 2.2% |
| Other values (88) | 581 |
| Value | Count | Frequency (%) |
| 19 | 1 | 0.1% |
| 20 | 1 | 0.1% |
| 21 | 1 | 0.1% |
| 22 | 1 | 0.1% |
| 23 | 1 | 0.1% |
| 26 | 3 | |
| 27 | 4 | |
| 28 | 2 | |
| 29 | 1 | 0.1% |
| 30 | 4 |
| Value | Count | Frequency (%) |
| 136 | 1 | |
| 131 | 1 | |
| 128 | 1 | |
| 124 | 1 | |
| 122 | 1 | |
| 121 | 1 | |
| 118 | 2 | |
| 116 | 1 | |
| 115 | 2 | |
| 114 | 2 |
totalprod
| Distinct | 625 |
|---|---|
| Distinct (%) | 79.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4140956.688 |
| Minimum | 84000 |
|---|---|
| Maximum | 46410000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.3 KiB |
Quantile statistics
| Minimum | 84000 |
|---|---|
| 5-th percentile | 231600 |
| Q1 | 470000 |
| median | 1500000 |
| Q3 | 4096000 |
| 95-th percentile | 19856000 |
| Maximum | 46410000 |
| Range | 46326000 |
| Interquartile range (IQR) | 3626000 |
Descriptive statistics
| Standard deviation | 6884593.859 |
|---|---|
| Coefficient of variation (CV) | 1.662561185 |
| Kurtosis | 9.657821906 |
| Mean | 4140956.688 |
| Median Absolute Deviation (MAD) | 1176000 |
| Skewness | 2.991733525 |
| Sum | 3250651000 |
| Variance | 4.73976326 × 10^13 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 280000 | 6 | 0.8% |
| 408000 | 6 | 0.8% |
| 336000 | 5 | 0.6% |
| 288000 | 5 | 0.6% |
| 324000 | 5 | 0.6% |
| 276000 | 5 | 0.6% |
| 260000 | 4 | 0.5% |
| 770000 | 4 | 0.5% |
| 330000 | 4 | 0.5% |
| 385000 | 4 | 0.5% |
| Other values (615) | 737 |
| Value | Count | Frequency (%) |
| 84000 | 1 | |
| 120000 | 1 | |
| 123000 | 1 | |
| 136000 | 1 | |
| 138000 | 1 | |
| 141000 | 1 | |
| 150000 | 2 | |
| 153000 | 1 | |
| 156000 | 2 | |
| 159000 | 1 |
| Value | Count | Frequency (%) |
| 46410000 | 1 | |
| 42140000 | 1 | |
| 37830000 | 1 | |
| 37350000 | 1 | |
| 36260000 | 1 | |
| 36000000 | 1 | |
| 34650000 | 1 | |
| 34500000 | 1 | |
| 33670000 | 1 | |
| 33120000 | 2 |
stocks
| Distinct | 584 |
|---|---|
| Distinct (%) | 74.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1257629.299 |
| Minimum | 8000 |
|---|---|
| Maximum | 13800000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.3 KiB |
Quantile statistics
| Minimum | 8000 |
|---|---|
| 5-th percentile | 41200 |
| Q1 | 119000 |
| median | 391000 |
| Q3 | 1380000 |
| 95-th percentile | 5938800 |
| Maximum | 13800000 |
| Range | 13792000 |
| Interquartile range (IQR) | 1261000 |
Descriptive statistics
| Standard deviation | 2211793.817 |
|---|---|
| Coefficient of variation (CV) | 1.758700929 |
| Kurtosis | 11.70453917 |
| Mean | 1257629.299 |
| Median Absolute Deviation (MAD) | 332000 |
| Skewness | 3.275719046 |
| Sum | 987239000 |
| Variance | 4.892031889 × 10^12 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 92000 | 6 | 0.8% |
| 69000 | 6 | 0.8% |
| 95000 | 5 | 0.6% |
| 104000 | 5 | 0.6% |
| 152000 | 4 | 0.5% |
| 151000 | 4 | 0.5% |
| 189000 | 4 | 0.5% |
| 52000 | 4 | 0.5% |
| 86000 | 4 | 0.5% |
| 106000 | 4 | 0.5% |
| Other values (574) | 739 |
| Value | Count | Frequency (%) |
| 8000 | 1 | 0.1% |
| 12000 | 2 | |
| 13000 | 1 | 0.1% |
| 14000 | 2 | |
| 17000 | 2 | |
| 19000 | 1 | 0.1% |
| 21000 | 3 | |
| 23000 | 1 | 0.1% |
| 24000 | 1 | 0.1% |
| 25000 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 13800000 | 1 | |
| 13545000 | 1 | |
| 13046000 | 1 | |
| 12995000 | 1 | |
| 12796000 | 1 | |
| 12326000 | 1 | |
| 12220000 | 1 | |
| 12127000 | 1 | |
| 11970000 | 1 | |
| 11818000 | 1 |
priceperlb
| Distinct | 273 |
|---|---|
| Distinct (%) | 34.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.695159236 |
| Minimum | 0.49 |
|---|---|
| Maximum | 7.09 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.3 KiB |
Quantile statistics
| Minimum | 0.49 |
|---|---|
| 5-th percentile | 0.62 |
| Q1 | 1.05 |
| median | 1.48 |
| Q3 | 2.04 |
| 95-th percentile | 3.658 |
| Maximum | 7.09 |
| Range | 6.6 |
| Interquartile range (IQR) | 0.99 |
Descriptive statistics
| Standard deviation | 0.930623423 |
|---|---|
| Coefficient of variation (CV) | 0.5489887932 |
| Kurtosis | 3.437928907 |
| Mean | 1.695159236 |
| Median Absolute Deviation (MAD) | 0.51 |
| Skewness | 1.568036977 |
| Sum | 1330.7 |
| Variance | 0.8660599555 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1.41 | 10 | 1.3% |
| 1.42 | 10 | 1.3% |
| 1.4 | 9 | 1.1% |
| 0.59 | 9 | 1.1% |
| 0.65 | 9 | 1.1% |
| 0.72 | 9 | 1.1% |
| 0.68 | 8 | 1.0% |
| 1.96 | 8 | 1.0% |
| 0.64 | 8 | 1.0% |
| 1.43 | 8 | 1.0% |
| Other values (263) | 697 |
| Value | Count | Frequency (%) |
| 0.49 | 1 | 0.1% |
| 0.52 | 2 | 0.3% |
| 0.53 | 2 | 0.3% |
| 0.54 | 2 | 0.3% |
| 0.55 | 2 | 0.3% |
| 0.56 | 1 | 0.1% |
| 0.57 | 5 | |
| 0.58 | 3 | 0.4% |
| 0.59 | 9 | |
| 0.6 | 7 |
| Value | Count | Frequency (%) |
| 7.09 | 1 | |
| 5.85 | 1 | |
| 5.53 | 1 | |
| 5.43 | 1 | |
| 5.42 | 1 | |
| 4.99 | 1 | |
| 4.89 | 1 | |
| 4.88 | 1 | |
| 4.78 | 1 | |
| 4.68 | 1 |
prodvalue
| Distinct | 733 |
|---|---|
| Distinct (%) | 93.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5489738.854 |
| Minimum | 162000 |
|---|---|
| Maximum | 83859000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.3 KiB |
Quantile statistics
| Minimum | 162000 |
|---|---|
| 5-th percentile | 377000 |
| Q1 | 901000 |
| median | 2112000 |
| Q3 | 5559000 |
| 95-th percentile | 23121200 |
| Maximum | 83859000 |
| Range | 83697000 |
| Interquartile range (IQR) | 4658000 |
Descriptive statistics
| Standard deviation | 9425393.878 |
|---|---|
| Coefficient of variation (CV) | 1.716911155 |
| Kurtosis | 20.34189798 |
| Mean | 5489738.854 |
| Median Absolute Deviation (MAD) | 1469000 |
| Skewness | 3.960801547 |
| Sum | 4309445000 |
| Variance | 8.883804976 × 10^13 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 736000 | 3 | 0.4% |
| 2122000 | 3 | 0.4% |
| 440000 | 3 | 0.4% |
| 651000 | 3 | 0.4% |
| 590000 | 3 | 0.4% |
| 259000 | 3 | 0.4% |
| 2552000 | 2 | 0.3% |
| 1361000 | 2 | 0.3% |
| 845000 | 2 | 0.3% |
| 1256000 | 2 | 0.3% |
| Other values (723) | 759 |
| Value | Count | Frequency (%) |
| 162000 | 1 | |
| 173000 | 1 | |
| 174000 | 1 | |
| 179000 | 1 | |
| 186000 | 1 | |
| 210000 | 1 | |
| 221000 | 1 | |
| 235000 | 1 | |
| 238000 | 1 | |
| 249000 | 1 |
| Value | Count | Frequency (%) |
| 83859000 | 1 | |
| 69986000 | 1 | |
| 69615000 | 1 | |
| 67565000 | 1 | |
| 65268000 | 1 | |
| 63590000 | 1 | |
| 54542000 | 1 | |
| 50669000 | 1 | |
| 48960000 | 1 | |
| 47817000 | 1 |
year
| Distinct | 19 |
|---|---|
| Distinct (%) | 2.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2006.817834 |
| Minimum | 1998 |
|---|---|
| Maximum | 2016 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.3 KiB |
Quantile statistics
| Minimum | 1998 |
|---|---|
| 5-th percentile | 1998 |
| Q1 | 2002 |
| median | 2007 |
| Q3 | 2012 |
| 95-th percentile | 2015.8 |
| Maximum | 2016 |
| Range | 18 |
| Interquartile range (IQR) | 10 |
Descriptive statistics
| Standard deviation | 5.491522957 |
|---|---|
| Coefficient of variation (CV) | 0.002736433204 |
| Kurtosis | -1.212078911 |
| Mean | 2006.817834 |
| Median Absolute Deviation (MAD) | 5 |
| Skewness | 0.04866639741 |
| Sum | 1575352 |
| Variance | 30.15682439 |
| Monotonicity | Increasing |
Histogram with fixed size bins (bins=19)
| Value | Count | Frequency (%) |
| 2001 | 44 | 5.6% |
| 2002 | 44 | 5.6% |
| 2003 | 44 | 5.6% |
| 1998 | 43 | 5.5% |
| 2000 | 43 | 5.5% |
| 1999 | 43 | 5.5% |
| 2006 | 41 | 5.2% |
| 2008 | 41 | 5.2% |
| 2007 | 41 | 5.2% |
| 2005 | 41 | 5.2% |
| Other values (9) | 360 |
| Value | Count | Frequency (%) |
| 1998 | 43 | 5.5% |
| 1999 | 43 | 5.5% |
| 2000 | 43 | 5.5% |
| 2001 | 44 | 5.6% |
| 2002 | 44 | 5.6% |
| 2003 | 44 | 5.6% |
| 2004 | 41 | 5.2% |
| 2005 | 41 | 5.2% |
| 2006 | 41 | 5.2% |
| 2007 | 41 | 5.2% |
| Value | Count | Frequency (%) |
| 2016 | 40 | 5.1% |
| 2015 | 40 | 5.1% |
| 2014 | 40 | 5.1% |
| 2013 | 39 | 5.0% |
| 2012 | 40 | 5.1% |
| 2011 | 40 | 5.1% |
| 2010 | 40 | 5.1% |
| 2009 | 40 | 5.1% |
| 2008 | 41 | 5.2% |
| 2007 | 41 | 5.2% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
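The calculation described above can be sketched in plain Python. The helper `pearson_r` is an illustrative name, not the routine the report generator calls, and the input is only the first five `numcol` and `totalprod` values from the "First rows" table, so the result is not the coefficient for the full dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of xs and ys over the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / math.sqrt(var_x * var_y)


# numcol and totalprod for the first five rows of the "First rows" table.
numcol = [16000, 55000, 53000, 450000, 27000]
totalprod = [1136000, 3300000, 3445000, 37350000, 1944000]
r = pearson_r(numcol, totalprod)

# Shifting and (positively) rescaling one variable leaves r unchanged.
r_shifted = pearson_r([x / 1000 + 7 for x in numcol], totalprod)
print(r, r_shifted)
```

The second call illustrates the invariance under changes in location and scale: the two printed values agree up to floating-point error.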
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
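A minimal sketch of that recipe follows: rank both variables, then apply the Pearson formula to the ranks. The helpers `average_ranks`, `pearson_r`, and `spearman_rho` are illustrative names, the data is synthetic, and giving tied values the mean of their ranks is an assumption of this sketch rather than something the report states.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# A monotonic but nonlinear relationship (y = x cubed).
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

For this cubic relationship ρ reaches 1 while Pearson's r stays below 1, which is the difference between monotonic and linear correlation described above.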
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
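The pair-counting definition above can be written down directly. The function below, with the illustrative name `kendall_tau_a`, implements exactly that formula (often called tau-a). Statistical libraries commonly apply a tie correction (tau-b) instead, so values may differ from the report when the data contains ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            concordant += 1  # both variables order the pair the same way
        elif product < 0:
            discordant += 1  # the variables order the pair in opposite ways
        # product == 0 means a tie in xs or ys: counted as neither
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Three observations: two concordant pairs and one discordant pair.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

This direct version compares every pair of observations, so its cost grows quadratically with the number of rows.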
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. The Phik project provides extensive documentation.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
First rows
| | state | numcol | yieldpercol | totalprod | stocks | priceperlb | prodvalue | year |
|---|---|---|---|---|---|---|---|---|
| 0 | Alabama | 16000.000 | 71 | 1136000.000 | 159000.000 | 0.720 | 818000.000 | 1998 |
| 1 | Arizona | 55000.000 | 60 | 3300000.000 | 1485000.000 | 0.640 | 2112000.000 | 1998 |
| 2 | Arkansas | 53000.000 | 65 | 3445000.000 | 1688000.000 | 0.590 | 2033000.000 | 1998 |
| 3 | California | 450000.000 | 83 | 37350000.000 | 12326000.000 | 0.620 | 23157000.000 | 1998 |
| 4 | Colorado | 27000.000 | 72 | 1944000.000 | 1594000.000 | 0.700 | 1361000.000 | 1998 |
| 5 | Florida | 230000.000 | 98 | 22540000.000 | 4508000.000 | 0.640 | 14426000.000 | 1998 |
| 6 | Georgia | 75000.000 | 56 | 4200000.000 | 307000.000 | 0.690 | 2898000.000 | 1998 |
| 7 | Hawaii | 8000.000 | 118 | 944000.000 | 66000.000 | 0.770 | 727000.000 | 1998 |
| 8 | Idaho | 120000.000 | 50 | 6000000.000 | 2220000.000 | 0.650 | 3900000.000 | 1998 |
| 9 | Illinois | 9000.000 | 71 | 639000.000 | 204000.000 | 1.190 | 760000.000 | 1998 |
Last rows
| | state | numcol | yieldpercol | totalprod | stocks | priceperlb | prodvalue | year |
|---|---|---|---|---|---|---|---|---|
| 775 | South Dakota | 280000.000 | 71 | 19880000.000 | 12127000.000 | 1.760 | 34989000.000 | 2016 |
| 776 | Tennessee | 6000.000 | 55 | 330000.000 | 69000.000 | 4.880 | 1610000.000 | 2016 |
| 777 | Texas | 133000.000 | 70 | 9310000.000 | 2607000.000 | 2.080 | 19365000.000 | 2016 |
| 778 | Utah | 31000.000 | 32 | 992000.000 | 169000.000 | 1.930 | 1915000.000 | 2016 |
| 779 | Vermont | 6000.000 | 52 | 312000.000 | 69000.000 | 3.640 | 1136000.000 | 2016 |
| 780 | Virginia | 5000.000 | 38 | 190000.000 | 30000.000 | 5.850 | 1112000.000 | 2016 |
| 781 | Washington | 84000.000 | 35 | 2940000.000 | 412000.000 | 1.990 | 5851000.000 | 2016 |
| 782 | West Virginia | 5000.000 | 32 | 160000.000 | 43000.000 | 3.920 | 627000.000 | 2016 |
| 783 | Wisconsin | 54000.000 | 62 | 3348000.000 | 1205000.000 | 2.670 | 8939000.000 | 2016 |
| 784 | Wyoming | 40000.000 | 68 | 2720000.000 | 190000.000 | 1.780 | 4842000.000 | 2016 |